June 29, 2016

Probationer Population

  • Mostly male
  • Mostly non-murderers, but still dangerous

LA County

Berk's study (Variable Importance Plot)

Feature selection (near-zero-variance screen)

##                 freqRatio percentUnique zeroVar   nzv
## BOOK DATE        1.000000    0.74877201   FALSE FALSE
## Murder         129.421875    0.01198035   FALSE  TRUE
## Age              1.053819    0.38936145   FALSE FALSE
## White            4.662822    0.01198035   FALSE FALSE
## Male             8.717113    0.01198035   FALSE FALSE
## ZIP              1.273859    3.73187972   FALSE FALSE
## Total_Pop        1.273859    3.54618426   FALSE FALSE
## Black_Pop        1.273859    3.28860669   FALSE FALSE
## Prop_Black       1.273859    3.42638074   FALSE FALSE
## Income           1.273859    3.51623338   FALSE FALSE
## PRIMARY CHARGE   1.296763    2.59374626   FALSE FALSE
## Gang             1.883745    0.01198035   FALSE FALSE
## RegisterSO      50.684211    0.01198035   FALSE  TRUE
## ViolentCase     11.704718    0.01198035   FALSE FALSE
## WeaponCase     104.658228    0.01198035   FALSE  TRUE
## DrugCase       537.516129    0.01198035   FALSE  TRUE
## MH               3.310354    0.01198035   FALSE FALSE
## Zip_Present      7.159335    0.01198035   FALSE FALSE
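The columns above (freqRatio, percentUnique, zeroVar, nzv) match the output of caret's nearZeroVar(saveMetrics = TRUE). The sketch below shows how those numbers arise. It is a rough base-R approximation, not the caret source, and it assumes caret's default cutoffs (freqCut = 95/5, uniqueCut = 10). The helper `nzv_metrics` and the data frame `toy` are hypothetical illustrations.

```r
# Near-zero-variance metrics, sketched in base R.
# Assumption: approximates caret::nearZeroVar(saveMetrics = TRUE) with its
# default cutoffs; this is an illustration, not caret's implementation.
nzv_metrics <- function(df, freq_cut = 95 / 5, unique_cut = 10) {
  rows <- lapply(names(df), function(col) {
    x <- df[[col]]
    # Counts of each distinct non-missing value, most common first
    counts <- sort(table(x[!is.na(x)]), decreasing = TRUE)
    # Most common value's count divided by the runner-up's count
    freq_ratio <- if (length(counts) <= 1) 0 else counts[[1]] / counts[[2]]
    n_unique <- length(counts)
    # Distinct values as a percentage of all rows (NAs included in n)
    pct_unique <- 100 * n_unique / length(x)
    zero_var <- n_unique <= 1
    data.frame(
      freqRatio = freq_ratio,
      percentUnique = pct_unique,
      zeroVar = zero_var,
      # Flag when the top value dominates and few distinct values exist
      nzv = zero_var | (freq_ratio > freq_cut & pct_unique <= unique_cut),
      row.names = col
    )
  })
  do.call(rbind, rows)
}

# Hypothetical toy data, not study data
toy <- data.frame(
  rare_flag = c(rep(0, 99), 1),   # heavily imbalanced, like Murder
  balanced  = rep(c(0, 1), 50),   # 50/50 split
  constant  = rep(1, 100)         # a single value
)
print(nzv_metrics(toy))
```

With the default freqCut of 95/5 = 19, only Murder, RegisterSO, WeaponCase and DrugCase have a freqRatio above 19, while every binary column sits at a percentUnique of about 0.012, which is why exactly those four rows are flagged in the table.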

Model 1

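library(randomForest)

# randomForest fits a classifier only when the response is a factor;
# this assumes Murder is stored as a 0/1 indicator in train
train$Murder <- as.factor(train$Murder)

fit <- randomForest(Murder ~ Age + White + Male + Total_Pop +
                        Black_Pop + Prop_Black + Income +
                        Zip_Present + Gang + ViolentCase,
                    data = train,
                    importance = TRUE,        # permutation variable importance
                    ntree = 1500,             # number of trees to grow
                    na.action = na.roughfix)  # median/mode imputation of NAs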
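Murder is one of the four near-zero-variance columns, so the outcome Model 1 predicts is rare. Assuming Murder is a 0/1 indicator with no missing values (the output above does not show this), the nearZeroVar numbers alone pin down the base rate. The sketch below does that arithmetic; the variable names are illustrative.

```r
# Base rate implied by the nearZeroVar output for Murder.
# Assumptions: Murder is 0/1, "0" is the more common value, and there are no NAs.
freq_ratio_murder <- 129.421875   # freqRatio = count(0) / count(1)
pct_unique_binary <- 0.01198035   # percentUnique = 100 * 2 / n

# Share of positive cases and accuracy of always predicting "no murder"
positive_rate <- 1 / (1 + freq_ratio_murder)
always_no_accuracy <- freq_ratio_murder / (1 + freq_ratio_murder)

# Row count and number of murder cases implied by the two metrics
n_rows <- round(200 / pct_unique_binary)
n_murder <- round(n_rows * positive_rate)

cat(sprintf("Rows: %d, murder cases: %d\n", n_rows, n_murder))
cat(sprintf("Positive rate: %.4f\n", positive_rate))
cat(sprintf("Always-no accuracy: %.4f\n", always_no_accuracy))
```

Under those assumptions the table implies 16,694 rows with 128 murder cases, a positive rate of about 0.77%. A model that never predicts murder would already be about 99.2% accurate, so raw accuracy says little about Model 1.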

Model 1 ROC
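An ROC curve traces the true-positive rate against the false-positive rate as the classification threshold moves, and the area under it (AUC) equals the probability that a randomly chosen positive case scores above a randomly chosen negative one. A base-R sketch follows. It assumes predicted probabilities are available; for a randomForest classifier, predict(fit, type = "prob") or the out-of-bag fit$votes matrix are the usual sources. The `score` and `label` vectors are made-up illustration values, not study data.

```r
# ROC points and AUC from scratch, base R only.
# `score` stands in for predicted murder probabilities, `label` for the
# observed 0/1 outcome; both are hypothetical.
roc_points <- function(score, label) {
  thresholds <- sort(unique(score), decreasing = TRUE)
  pos <- sum(label == 1)
  neg <- sum(label == 0)
  # At each threshold, classify score >= threshold as positive
  tpr <- sapply(thresholds, function(t) sum(score >= t & label == 1) / pos)
  fpr <- sapply(thresholds, function(t) sum(score >= t & label == 0) / neg)
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

# AUC via the Mann-Whitney statistic: the share of (positive, negative)
# pairs in which the positive case has the higher score (ties count 1/2).
auc_mw <- function(score, label) {
  r <- rank(score)  # average ranks handle tied scores
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

score <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
label <- c(1,   1,   0,   1,   0,    0,   0,   0)
print(roc_points(score, label))
print(auc_mw(score, label))  # 14 of 15 pairs are ordered correctly
```

The rank formulation avoids building the full curve just to get the area, and it treats tied scores consistently with the trapezoidal ROC area.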

Evaluation

  • Context, context, context
  • The final confusion matrix
  • Comparison to logistic regression and LS/CMI
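A confusion matrix fixes one threshold and counts the two kinds of error separately, which is where context matters: a missed murder and a false alarm carry very different costs. The sketch below builds one in base R; the helper `confusion_at`, the scores, the labels and the 0.65 threshold are all hypothetical.

```r
# Confusion matrix at a chosen probability threshold, base R only.
confusion_at <- function(score, label, threshold) {
  predicted <- factor(as.integer(score >= threshold), levels = c(0, 1))
  observed  <- factor(label, levels = c(0, 1))
  table(Observed = observed, Predicted = predicted)
}

# Hypothetical illustration values, not study data
score <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
label <- c(1,   1,   0,   1,   0,    0,   0,   0)

cm <- confusion_at(score, label, threshold = 0.65)
print(cm)

# Rows are observed classes, columns are predicted classes
false_neg_rate <- cm["1", "0"] / sum(cm["1", ])  # share of murders missed
false_pos_rate <- cm["0", "1"] / sum(cm["0", ])  # share of non-murders flagged
fp_per_fn <- cm["0", "1"] / cm["1", "0"]         # false alarms per missed case
```

Here the threshold yields one false alarm per missed case. Moving the threshold trades one error for the other, so the ratio worth accepting has to be set from the application's context rather than read off the model.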

Implementation

  • 2,300 early releases